In this work, an improved methodology for studying interactions of proteins in solution by small-angle scattering is presented. Unlike the most common approach, where the protein-protein correlation functions $g_{ij}(r)$ are approximated by their zero-density limit (i.e. the Boltzmann factor), we propose a more accurate representation of $g_{ij}(r)$ which takes into account terms up to first order in the density expansion of the mean-force potential. This improvement is expected to be particularly effective for strong protein-protein interactions at intermediate concentrations. The method is applied to analyse small-angle X-ray scattering data obtained, as a function of ionic strength (from 7 to 507 mM), from acidic solutions of β-lactoglobulin at a fixed concentration of 10 g L$^{-1}$. The results are compared with those obtained using the zero-density approximation and show a significant improvement, particularly in the more demanding case of low ionic strength.

Running Title: Interaction of proteins by SAS 
Keywords: long-range interactions, mean-force potential, density expansion, pair correlation functions, structure factor

I. INTRODUCTION

The study of protein-protein interactions in solution, and the determination of both the physical origin of long-range interactions and the geometry and energetics of molecular recognition, can provide the most effective way of correlating the structure and biological function of proteins. In recent years, a large effort has been devoted to improving the understanding of interactions between macromolecules in solution. In particular, it has been widely recognized that the evaluation of electrostatic potentials can produce quantitative predictions and that factors such as self-energy, polarizability and local polarity can be biologically crucial (Halgren and Damm, 2001; Sheinerman et al., 2000). Nevertheless, major conceptual and practical problems still exist. They concern, for instance, the experimental techniques required to measure interaction potentials under physiologically relevant conditions, as well as the clarification of the role of the solvent and of protein shape and charge anisotropy.

Several biophysical methods can be used to extract quantitative data on protein-protein interactions, although a detailed analysis of long-range interactions has so far been limited to a few associating colloids (Chen and Lin, 1987; Itri and Amaral, 1991) and has usually been based on light scattering or osmotic stress methods (Parsegian and Evans, 1996). However, small-angle scattering (SAS) is certainly the most appropriate tool for studying the overall structure of protein solutions, because of its small perturbing effect on the system and the possibility of deriving information on structural properties and interactions under very different experimental conditions (pH, ionic strength, temperature, presence of cosolvents, ligands, denaturing agents and so on).

In most analyses of SAS data, however, particle interactions are disregarded, assuming either large separations or weak interaction forces. The interactions among macromolecules determine their spatial arrangement, which can be described by correlation functions. These functions may be related, for instance via integral equations, to the direct pair potentials describing the interaction between two particles. When the average distance among particles is large or the interaction potentials are weak, the influence of the average structure factor of the system (i.e. the Fourier transform of the average correlation function) may be negligible inside the experimental angular window considered, and the particles can be regarded as completely uncorrelated. Under these conditions, the SAS intensity depends only on the average form factor. Note that this approximation, which neglects all intermolecular forces, is used in most applications of X-ray or neutron SAS (Kozin et al., 1997; Chacon et al., 1998).

When these conditions are not met, particles cannot be considered uncorrelated, and the average structure factor cannot be neglected in the expression for the SAS intensity. In this case data analysis is far more complicated. In principle, asymptotic behaviors could be used to separate the SAS intensity into (average) form and structure factors (Abis et al., 1990). If the particle form factors are known, an experimental average structure factor can be extracted by dividing the intensity by the average form factor. Some insight into the intermolecular forces may then be obtained by comparison with the theoretical structure factor calculated from an interaction model, using analytical or numerical methods from the statistical mechanical theory of liquids (Hansen and McDonald, 1986).

Unfortunately, the most powerful and accurate techniques provided by this theory, such as Monte Carlo and molecular dynamics computer simulations as well as integral equations, can hardly be included in a typical best-fit procedure for analysing experimental data. At very low concentrations, a first way of improving on the crude recipe of neglecting the average structure factor is to evaluate that quantity by approximating the pair correlation functions $g_{ij}(r)$ with their zero-density limit, given by the Boltzmann factor (Velev et al., 1997). In the present paper, we shall show that this zero-density approximation becomes practically unusable at the usual protein concentrations when the ionic strength is low, i.e., in the presence of strong electrostatic interactions. Clearly, it would be desirable to find an alternative, simple but reasonably accurate, way of computing the average structure factor of globular proteins at low or moderate concentrations. This is the main aim of our paper.

Although the new proposal is methodological and thus applicable, in principle, to a wide class of spherically symmetric interaction models, it will be illustrated on a concrete case, as part of a more general study of the structural properties of a particular protein in solution, β-lactoglobulin (βLG).

In a previous paper (Baldini et al., 1999), which provides a natural introduction to the present work, all long-range protein-protein interactions were neglected and the average structure factor was assumed to be unity. That investigation reported experimental data on the structural properties of acidic βLG solutions (pH 2.3) at several values of ionic strength in the range 7-507 mM (Baldini et al., 1999). Photon correlation spectroscopy and small-angle X-ray scattering (SAXS) experiments gave clear evidence of a monomer-dimer equilibrium affected by the ionic strength. In the angular region where the SAXS experiments were performed, the contribution of long-range protein-protein interactions was expected to be rather small. Accordingly, SAXS data were analysed only in terms of βLG monomer and dimer form factors, which were calculated very accurately. Short-range forces responsible for protein aggregation were taken into account only implicitly, through a chemical association equilibrium employed to evaluate the dimerization fraction. A global fit procedure allowed the determination of the monomer effective charge, as well as of the protein dissociation free energy over a wide range of ionic strength (Baldini et al., 1999).

In the present paper, we investigate, within the same physical system, the long-range protein-protein interactions, which can strongly influence small-angle scattering at low ionic strength. To this end, two issues have to be addressed. First, the experimental SAXS angular region must be extended to lower values of the scattering vector, where long-range forces play an important role. Second, an accurate and tractable theoretical scheme must be selected for calculating the average structure factor to be used in fitting the experimental data. Both tasks have been accomplished in this work.

We first report a new set of SAXS measurements on βLG performed under the same experimental conditions as those of Baldini and coworkers (Baldini et al., 1999), but at smaller angles. At low ionic strength, these data unambiguously display a decrease in the scattering intensity at small angles, with the progressive development of an interference peak. This is a clear signature of strong protein-protein interactions, and we shall show that it can be simply interpreted in terms of screened electrostatic repulsions among charged macroions.

Next, we propose an improved calculation of the theoretical average structure factor, based on a new approximation to the protein-protein correlation functions $g_{ij}(r)$. Starting from the density expansion of the corresponding mean-force potentials, we shall show that simply adding the first-order perturbative correction to the direct pair potentials yields a marked improvement over the Boltzmann factor, while retaining the same level of simplicity. The new approximation is indeed able to predict, at low ionic strength, the interference peak observed in the experimental scattering intensity, and consequently it leads to a significantly improved fit.

We stress, in advance, that a check of the unavoid- 
able limits of validity of the proposed approach will not 
be treated here. A further study involving a compari- 
son with more accurate theoretical results (from Monte 
Carlo or molecular dynamics, as well as from integral 
equations) is, of course, desirable, but goes beyond the 
scope of the present paper, and will be left for future 
work. 



II. BASIC THEORY 

Because of the presence of an aggregation equilibrium, a βLG solution contains two different forms of macroions (protein monomers and dimers) embedded in a suspending fluid and in a sea of microions, which include both the counterions neutralizing all protein charges and the small ions originating from added electrolyte salts. To represent such a system, we employ a simple "two-component macroion model", which effectively takes into account only the protein particles. Within this scheme, usually referred to as the Derjaguin-Landau-Verwey-Overbeek (DLVO) model (Verwey and Overbeek, 1948), the suspending fluid (solvent) is represented as a uniform dielectric continuum and all microions are treated as point-like particles. The solvent and microions enter only through the macroion-macroion effective potentials. A further simplification follows from the assumption of spherically symmetric interactions. In our model, components 1 and 2 correspond to monomers and dimers, respectively.

Before addressing the specific system under investigation, it is convenient to recall some basic points of the general theory.



A. Scattering functions 

The macroscopic differential coherent scattering cross section $d\Sigma/d\Omega$, obtained from a SAS experiment, is related to the presence of scattering centers, i.e. density and/or structural inhomogeneities, and can yield quantitative information about their dimensions and concentration, as well as their shape and interaction potentials. The cross section is proportional to the square of the "contrast", namely the difference in electron density multiplied by the classical electron radius (or in scattering length density, in the neutron case) between the scattering centers and the surrounding medium; for biological samples, this quantity can also be tuned to obtain more detailed information about the scattering structures (contrast variation technique; Jacrot, 1976). Proteins in solution are an excellent example of inhomogeneities for SAS measurements, owing to their high contrast with X-rays (as well as with neutrons). The general equation for the SAS intensity is



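$$\frac{d\Sigma}{d\Omega}(\mathbf{Q}) = \frac{1}{V}\left\langle \left| \int_V d\mathbf{r}\, \delta\rho(\mathbf{r})\, e^{i\mathbf{Q}\cdot\mathbf{r}} \right|^2 \right\rangle \qquad (1)$$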



$\mathbf{Q}$ being the exchanged wave vector, with magnitude $Q = (4\pi/\lambda)\sin\theta$, where $\lambda$ is the wavelength of the incident radiation and $2\theta$ is the full scattering angle. The integral in Eq. (1) extends over the sample volume $V$, with $\mathbf{r}$ the position vector and $\delta\rho(\mathbf{r})$ the fluctuation, with respect to a uniform value $\rho_0$, of the local electron density multiplied by the classical electron radius (or simply the scattering length density in the case of neutrons). Angular brackets denote an ensemble average over all possible configurations of the proteins in the sample.

Eq. (1) can be reduced to a simpler form when the interactions are spherically symmetric. Using a "two-phase" representation of the fluid (a single type of homogeneous scattering material with scattering density $\rho_p$ inside the proteins, embedded in a homogeneous solvent phase with density $\rho_0$) yields



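$$\frac{d\Sigma}{d\Omega}(Q) = (\Delta\rho)^2 \left\{ \sum_{i=1}^{p} n_i V_i^2 \left[ \langle F_i^2(Q)\rangle_{\omega_Q} - \langle F_i(Q)\rangle_{\omega_Q}^2 \right] + \sum_{i,j=1}^{p} (n_i n_j)^{1/2} V_i V_j \, \langle F_i(Q)\rangle_{\omega_Q} \langle F_j(Q)\rangle_{\omega_Q} \, S_{ij}(Q) \right\} \qquad (2)$$

where $\Delta\rho = \rho_p - \rho_0$ is the contrast, $p$ the number of protein species ($p = 2$ for our solutions of monomers and dimers), $n_i$ the number density of species $i$, $V_i$ its volume, $F_i(Q)$ its form factor, $S_{ij}(Q)$ the Ashcroft-Langreth partial structure factors, and $\langle \dots \rangle_{\omega_Q}$ denotes an orientational average.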



The partial structure factors (Ashcroft and Langreth, 1967) are defined as

$$S_{ij}(Q) = \delta_{ij} + 4\pi (n_i n_j)^{1/2} \int_0^\infty dr\, r^2 \left[ g_{ij}(r) - 1 \right] \frac{\sin(Qr)}{Qr} \qquad (3)$$

in terms of the three-dimensional Fourier transform of $g_{ij}(r) - 1$, where $g_{ij}(r)$ is the pair correlation function (or radial distribution function) between particles of species $i$ and $j$.

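Finally, the average form factor $P(Q)$ and the average structure factor $S_M(Q)$ are

$$P(Q) = (\Delta\rho)^2 \sum_{i=1}^{p} n_i V_i^2 \langle F_i^2(Q)\rangle_{\omega_Q} \qquad (4)$$

$$S_M(Q) = \frac{d\Sigma}{d\Omega}(Q) \Big/ P(Q). \qquad (5)$$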
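The following NumPy sketch evaluates Eqs. (2)-(5) numerically, assuming that the orientational averages and the radial distribution functions are already tabulated on grids; the function names are illustrative and are not taken from the authors' code. The kernel $\sin(Qr)/(Qr)$ is written with `np.sinc`, which NumPy defines as $\sin(\pi x)/(\pi x)$, so it stays finite at $Qr = 0$.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y (local helper, independent of NumPy version)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def partial_structure_factor(q, r, g_ij, n_i, n_j, same_species):
    """Eq. (3): S_ij(Q) = delta_ij + 4 pi (n_i n_j)^(1/2) int dr r^2 [g_ij(r) - 1] sin(Qr)/(Qr)."""
    kernel = np.sinc(np.outer(q, r) / np.pi)          # sin(Qr)/(Qr), equal to 1 at Qr = 0
    integral = trapezoid(r**2 * (g_ij - 1.0) * kernel, r)
    return float(same_species) + 4.0 * np.pi * np.sqrt(n_i * n_j) * integral


def intensity_and_sm(delta_rho, n, V, F1, F2, S):
    """Eqs. (2), (4) and (5) for p species.

    n, V   : number densities and volumes, shape (p,)
    F1, F2 : <F_i(Q)> and <F_i^2(Q)>, shape (p, nq)
    S      : partial structure factors S_ij(Q), shape (p, p, nq)
    """
    w = n[:, None] * V[:, None] ** 2
    incoherent = np.sum(w * (F2 - F1**2), axis=0)           # first sum in Eq. (2)
    amp = np.sqrt(n)[:, None] * V[:, None] * F1              # (n_i)^(1/2) V_i <F_i>
    coherent = np.einsum("iq,ijq,jq->q", amp, S, amp)        # second sum in Eq. (2)
    intensity = delta_rho**2 * (incoherent + coherent)
    P = delta_rho**2 * np.sum(w * F2, axis=0)                # Eq. (4)
    return intensity, intensity / P                          # Eq. (5): S_M = I / P
```

With $S_{ij}(Q) = \delta_{ij}$ (no correlations) the two sums in Eq. (2) recombine into $P(Q)$ and $S_M(Q) = 1$, which is a convenient sanity check before plugging in model $g_{ij}(r)$.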



B. Protein form factors

The orientationally averaged form factor of species $i$ can be written as

$$\langle F_i(Q)\rangle_{\omega_Q} = \int_0^\infty dr\, p_i^{(1)}(r)\, \frac{\sin(Qr)}{Qr} \qquad (6)$$

where $p_i^{(1)}(r)$ is the probability density, for the $i$-th species, that a point at distance $r$ from the protein center of mass lies inside the macromolecule. Similarly, the orientationally averaged squared form factor is given by (Guinier and Fournet, 1955)

$$\langle F_i^2(Q)\rangle_{\omega_Q} = \int_0^\infty dr\, p_i^{(2)}(r)\, \frac{\sin(Qr)}{Qr} \qquad (7)$$

where $p_i^{(2)}(r)$ is the probability density, for the $i$-th species, of finding a segment of length $r$ with both ends inside the macromolecule. Both $p_i^{(1)}(r)$ and $p_i^{(2)}(r)$ are normalized to unity. These distribution functions have been calculated from the crystallographic structures of both the monomer and dimer forms of the protein, as described in Refs. (Baldini et al., 1999; Mariani et al., 2000), briefly recalled in Appendix A, and discussed in Subsection III C.
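As a check of Eqs. (6) and (7), the sketch below evaluates both integrals for a homogeneous sphere of radius $R$, for which the distributions are known in closed form: $p^{(1)}(r) = 3r^2/R^3$ for $r \le R$, and $p^{(2)}(r) = (3r^2/R^3)\,[1 - 3r/(4R) + r^3/(16R^3)]$ for $r \le 2R$. The sphere is only an illustrative stand-in (an assumption); for βLG the distributions come from the crystallographic structures. Since a sphere is isotropic, both averages should reproduce the familiar amplitude $3(\sin x - x\cos x)/x^3$ with $x = QR$, and its square.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def orientational_form_factors(q, r, p1, p2):
    """Eqs. (6)-(7): <F(Q)> and <F^2(Q)> from the normalised distributions p1(r), p2(r)."""
    kernel = np.sinc(np.outer(q, r) / np.pi)   # sin(Qr)/(Qr)
    return trapezoid(p1 * kernel, r), trapezoid(p2 * kernel, r)


def sphere_distributions(r, R):
    """Closed-form p^(1)(r) and p^(2)(r) of a homogeneous sphere of radius R (test case only)."""
    x = r / R
    p1 = np.where(r <= R, 3.0 * r**2 / R**3, 0.0)
    p2 = np.where(r <= 2.0 * R, 3.0 * r**2 / R**3 * (1.0 - 0.75 * x + x**3 / 16.0), 0.0)
    return p1, p2


def sphere_amplitude(q, R):
    """Analytic form-factor amplitude of a sphere, 3 (sin x - x cos x) / x^3 with x = QR (Q > 0)."""
    x = q * R
    return 3.0 * (np.sin(x) - x * np.cos(x)) / x**3


if __name__ == "__main__":
    R = 19.15                                   # illustrative radius in Å
    r = np.linspace(0.0, 2.0 * R, 8001)
    q = np.array([0.035, 0.1, 0.2, 0.3])        # Å^-1
    F1, F2 = orientational_form_factors(q, r, *sphere_distributions(r, R))
    print(np.c_[q, F1, sphere_amplitude(q, R), F2, sphere_amplitude(q, R) ** 2])
```

For a non-spherical body such as the βLG dimer, $\langle F_i^2\rangle_{\omega_Q}$ and $\langle F_i\rangle_{\omega_Q}^2$ differ, and that difference is what feeds the first sum in Eq. (2).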
C. Protein-protein interaction potentials

The choice of the proper potential is a rather delicate matter and depends on the system investigated. For instance, in a study on lysozyme (Kuehner et al., 1997) the protein-protein interaction was assumed to be the sum of
four contributions, namely a hard-sphere term, an electrostatic repulsion, an attractive dispersion potential and a short-range attraction. In a different study, on lysozyme and chymotrypsinogen (Velev et al., 1997), five contributions were considered instead: charge-charge repulsion, charge-dipole, dipole-dipole and van der Waals attraction, along with further complex short-range interactions. In this paper we follow a different route, motivated by the fact that the presence of several interaction terms may obscure the relative importance of each of them. Moreover, the choice of a very refined potential would be in striking contrast with the very crude approximations used in calculating the radial distribution functions (RDFs). On this basis, we search for the simplest possible model potential that is still capable of capturing the essential features of the system. It will be the sum of two repulsive contributions:



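$$u_{ij}(r) = u_{ij}^{HS}(r) + u_{ij}^{Y}(r) \qquad (8)$$

where

$$u_{ij}^{HS}(r) = \begin{cases} +\infty, & 0 < r < R_i + R_j \\ 0, & r > R_i + R_j \end{cases} \qquad (9)$$

is a hard-sphere (HS) term which accounts for excluded-volume effects ($R_i$ being the radius of species $i$), and

$$u_{ij}^{Y}(r) = \frac{Z_i Z_j e^2}{\varepsilon (1 + \kappa_D R_i)(1 + \kappa_D R_j)} \, \frac{\exp[-\kappa_D (r - R_i - R_j)]}{r} \qquad (10)$$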



represents a screened Coulomb repulsion between the macroion charges, which are of the same sign. This term has the same Yukawa form as in the Debye-Hückel theory of electrolytes, but the coupling coefficients are of DLVO type (Verwey and Overbeek, 1948). Here, $e$ is the elementary charge, $\varepsilon$ the dielectric constant of the solvent, and $Z_i$, the effective valency of species $i$, may depend on the pH. The inverse Debye screening length $\kappa_D$, defined as

$$\kappa_D = \left[ \frac{8\pi N_A e^2 \beta}{\varepsilon} (I_S + I_c) \right]^{1/2} \qquad (11)$$

depends on temperature ($\beta = (k_B T)^{-1}$) and on the ionic strength of all microions. $I_S$ and $I_c$ represent the ionic strength of all added salts ($S$) and of the counterions ($c$), respectively. Both terms are of the form $\frac{1}{2}\sum_i c_i^{\mathrm{micro}} (Z_i^{\mathrm{micro}})^2$, with $c_i^{\mathrm{micro}} = n_i^{\mathrm{micro}}/N_A$ the molar concentration of microspecies $i$ ($N_A$ is Avogadro's number). $I_c$ is related to the macroion number densities $n_1$ and $n_2$ (1 = monomer, 2 = dimer) through the electroneutrality condition, according to which the counterions must neutralize all macroion charges, i.e. $n_c |Z_c| = n_1 |Z_1| + n_2 |Z_2|$. Notice that the dependence of $\kappa_D$ on $I_S$ implies that the strength of the effective potential $u_{ij}^{Y}(r)$ can be varied widely by adding an electrolyte to the solution.
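Eqs. (10) and (11) are written in Gaussian units. The sketch below converts them to SI units as a matter of convenience ($e^2/\varepsilon \to e^2/(4\pi\varepsilon_0\varepsilon_r)$), so that $\beta u_{ij}^{Y}$ is expressed through the Bjerrum length $l_B = e^2/(4\pi\varepsilon_0\varepsilon_r k_B T)$ and $\kappa_D^2 = 2 N_A e^2 (I_S + I_c)/(\varepsilon_0 \varepsilon_r k_B T)$ with ionic strengths in mol m$^{-3}$. The counterion term $I_c$ is obtained from the electroneutrality condition under the additional assumption of monovalent counterions ($|Z_c| = 1$). All function names are hypothetical.

```python
import numpy as np

# SI constants (exact values of the 2019 SI, except eps0)
E_CHARGE = 1.602176634e-19     # C
N_AVOGADRO = 6.02214076e23     # mol^-1
K_BOLTZMANN = 1.380649e-23     # J K^-1
EPS0 = 8.8541878128e-12        # F m^-1


def bjerrum_length(T, eps_r):
    """l_B = e^2 / (4 pi eps0 eps_r k_B T), in metres."""
    return E_CHARGE**2 / (4.0 * np.pi * EPS0 * eps_r * K_BOLTZMANN * T)


def counterion_ionic_strength(c1, c2, Z1, Z2):
    """I_c (mol/L) from n_c |Z_c| = n_1 |Z_1| + n_2 |Z_2|, ASSUMING monovalent counterions."""
    c_counter = c1 * abs(Z1) + c2 * abs(Z2)   # counterion molarity when |Z_c| = 1
    return 0.5 * c_counter                     # I_c = (1/2) c_c Z_c^2


def debye_kappa(I_S, I_c, T, eps_r):
    """Eq. (11) in SI units; ionic strengths in mol/L, result in m^-1."""
    ionic = (I_S + I_c) * 1.0e3                # mol/L -> mol/m^3
    return np.sqrt(2.0 * N_AVOGADRO * E_CHARGE**2 * ionic / (EPS0 * eps_r * K_BOLTZMANN * T))


def beta_u(r, Zi, Zj, Ri, Rj, kappa, lB):
    """Eqs. (8)-(10) in units of k_B T (lengths in metres, r > 0):
    hard core for r < Ri + Rj, DLVO-screened Coulomb repulsion beyond."""
    r = np.asarray(r, dtype=float)
    contact = Ri + Rj
    prefactor = Zi * Zj * lB / ((1.0 + kappa * Ri) * (1.0 + kappa * Rj))
    yukawa = prefactor * np.exp(-kappa * (r - contact)) / r
    return np.where(r < contact, np.inf, yukawa)


if __name__ == "__main__":
    T, eps_r = 293.0, 78.5
    c1 = 10.0 / 18400.0   # mol/L, illustrative: 10 g/L of βLG treated as fully monomeric
    I_c = counterion_ionic_strength(c1, 0.0, 20, 36)
    for I_S in (0.007, 0.027, 0.507):
        kappa = debye_kappa(I_S, I_c, T, eps_r)
        print(f"I_S = {I_S * 1e3:5.0f} mM: I_c = {I_c * 1e3:.2f} mM, 1/kappa_D = {1e10 / kappa:.1f} Å")
```

The illustrative numbers suggest why the counterion term is worth keeping: at the lowest added-salt ionic strength, $I_c$ computed this way is of the same order as $I_S$.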



We have explicitly checked that the addition of an attractive term in the form of a Hamaker potential $u_{ij}^{H}(r)$ (Israelachvili, 1992) does not alter our final conclusions. The basic reason is that van der Waals attractions may be completely masked by $u_{ij}^{Y}(r)$ when the electrostatic repulsion is strong, and are also negligible for moderately charged particles with diameters smaller than 50 nm (Nägele, 1996). Moreover, $u_{ij}^{H}(r)$ diverges at $r = R_i + R_j$, so that its applicability could be preserved only by adding a non-interpenetrating hydration/Stern layer (Baldini et al., 1999; Kuehner et al., 1997).

We stress that some attractive interactions must nevertheless be present in the system, since they are responsible for the aggregation of monomers into dimers and determine the value of the monomer molar fraction $x_1$, which is required to complete the definition of our model. Owing to the complexity of these interactions (including hydrogen bonding), a clear understanding of their explicit functional forms is still lacking. Therefore, following Baldini et al. (1999), we account for them indirectly, by using a chemical association equilibrium to fix $x_1$. The dissociation free energy, which determines the equilibrium constant, is written as the sum of two contributions, i.e.



$$\Delta G_{dis} = \Delta G_{el} + \Delta G_{nel}, \qquad (12)$$



where $\Delta G_{el}$ is an electrostatic term calculated within Debye-Hückel theory, and $\Delta G_{nel}$ is an unknown non-electrostatic contribution, which will be left as a free parameter in the best-fit analysis.
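The chemical equilibrium that fixes $x_1$ can be sketched as follows, assuming a simple dimerization $2\,\mathrm{M} \rightleftharpoons \mathrm{D}$ with dissociation constant $K = [\mathrm{M}]^2/[\mathrm{D}] = \exp(-\Delta G_{dis}/RT)$ (standard state 1 mol L$^{-1}$), the mass balance $c = [\mathrm{M}] + 2[\mathrm{D}]$ in monomer units, and $x_1 = [\mathrm{M}]/([\mathrm{M}] + [\mathrm{D}])$. These conventions and the helper name are our assumptions; the Debye-Hückel expression for $\Delta G_{el}$ used by Baldini et al. (1999) is not reproduced, so both terms of Eq. (12) are simply passed in as numbers.

```python
import math

R_GAS = 8.314462618  # J mol^-1 K^-1


def monomer_fraction(dG_el, dG_nel, c_total, T):
    """Monomer molar fraction x_1 for the equilibrium 2M <-> D (assumed conventions).

    dG_el, dG_nel : the two terms of Eq. (12), in J/mol
    c_total       : total protein concentration in monomer units, mol/L
    Returns (x1, [M], [D]) with concentrations in mol/L.
    """
    dG_dis = dG_el + dG_nel                               # Eq. (12)
    K = math.exp(-dG_dis / (R_GAS * T))                   # K = [M]^2 / [D], mol/L
    # Positive root of 2[M]^2/K + [M] - c = 0, written to avoid cancellation.
    m = 2.0 * c_total / (1.0 + math.sqrt(1.0 + 8.0 * c_total / K))
    d = m * m / K
    return m / (m + d), m, d
```

In a global fit, $\Delta G_{el}$ would be recomputed for each ionic strength while $\Delta G_{nel}$ stays a single shared parameter, which is how $x_1$ acquires its $I_S$ dependence.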



D. Radial distribution functions 

Given a model potential, one has to calculate the corresponding radial distribution functions (RDFs) $g_{ij}(r)$, which can be expressed by the exact relations

$$g_{ij}(r) = \exp\left[-\beta W_{ij}(r)\right], \qquad (13)$$

$$-\beta W_{ij}(r) = -\beta u_{ij}(r) + \omega_{ij}(r), \qquad (14)$$

where $W_{ij}(r)$ is the potential of mean force, which includes the direct pair potential $u_{ij}(r)$ as well as $-\beta^{-1}\omega_{ij}(r)$, i.e. the indirect interaction between $i$ and $j$ due to their interactions with all the remaining macroparticles of the fluid. In the zero-density limit, $\omega_{ij}(r)$ vanishes and $g_{ij}(r)$ reduces to the Boltzmann factor, i.e.

$$g_{ij}(r) = \exp\left[-\beta u_{ij}(r)\right] \quad \text{as } n \to 0, \qquad (15)$$

which represents a zeroth-order approximation, frequently used in the analysis of experimental scattering data ($n = \sum_m n_m$ is the total number density).

The most common procedure for determining an accurate $g_{ij}(r)$ or, equivalently, the correction term $\omega_{ij}(r)$, would be to solve the Ornstein-Zernike (OZ) integral equations of liquid-state theory within some approximate closure relation (Hansen and McDonald, 1986). This must typically be done numerically, with the exception of a few simple cases (for certain potentials and particular closures) where the solution can be worked out analytically.

For our hard-sphere-Yukawa potential (neglecting the Hamaker term), the OZ equations do admit an analytical solution when coupled with the "mean spherical approximation" (MSA) (Blum and Hoye, 1978; Ginoza, 1990; Hayter and Penfold, 1981). Nevertheless, at low density and for strong repulsion the MSA RDFs may assume unphysical negative values close to interparticle contact (Nägele, 1996). To overcome this difficulty, one could use an analytical "rescaled MSA" (Nägele, 1996; Hansen and Hayter, 1982; Ruiz-Estrada et al., 1990), or resort to different closures (the Rogers-Young approximation or the "hypernetted chain" closure), which require a numerical solution (Rogers and Young, 1984; Zerah and Hansen, 1986; Wagner et al., 1991; Krause et al., 1991; D'Aguanno and Klein, 1992; D'Aguanno et al., 1992; Nägele et al., 1993).

More generally, when only numerical solutions are feasible, integral equation algorithms can hardly be included in a best-fit program for the analysis of SAS results. The use of analytical solutions, or of simple approximations requiring only minor computational effort, is clearly much more advantageous when fitting experimental data. The zeroth-order approximation given in Eq. (15) avoids the problem of solving the OZ equations, but is largely inaccurate except, perhaps, at very low densities.

In order to improve on this zeroth-order approximation to the RDFs, the basic idea put forward in the present work hinges upon the expansion of the potential of mean force in a power series of the total number density $n$ (Meeron, 1958). Neglecting all terms beyond first order, Eqs. (13) and (14) yield

$$g_{ij}(r) = \exp\left[-\beta u_{ij}(r) + \omega_{ij}^{(1)}(r)\, n\right]. \qquad (16)$$

By construction, this expression is never negative, thus avoiding the major drawback of the MSA. The explicit expression for the perturbative correction $\omega_{ij}^{(1)}(r)$ is given in Appendix B. This first-order approximation substantially improves the accuracy of the RDFs with respect to Eq. (15), while remaining at nearly the same level of simplicity (see Appendix B). Moreover, it should be stressed that the new approximation is not restricted to the model of this paper: the proposed calculation scheme can equally well be applied to other spherically symmetric potentials.
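For orientation, the first-order term can be evaluated with the standard low-density result for the cavity function, $\omega_{ij}^{(1)}(r)\, n = \sum_k n_k \int d\mathbf{r}'\, f_{ik}(r')\, f_{kj}(|\mathbf{r} - \mathbf{r}'|)$, with Mayer functions $f_{ij} = e^{-\beta u_{ij}} - 1$. We assume, without reproducing Appendix B, that this is equivalent to the expression used there. The sketch reduces the three-dimensional convolution to a double radial integral in bipolar coordinates; the helper names are hypothetical. For hard spheres of diameter $\sigma$ the convolution equals the overlap volume $\pi(4\sigma + r)(2\sigma - r)^2/12$ for $r < 2\sigma$, which provides a quick check.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule for 1-D arrays."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mayer_convolution(r_eval, grid, f_a, f_b):
    """C(r) = int d^3r' f_a(|r'|) f_b(|r - r'|) for spherically symmetric f_a, f_b.

    Bipolar coordinates give C(r) = (2 pi / r) int ds s f_a(s) int_{|r-s|}^{r+s} dt t f_b(t).
    Assumes f_a and f_b vanish beyond grid[-1] and 0 < r <= grid[-1].
    """
    tf = grid * f_b
    # Cumulative F(t) = int_0^t t' f_b(t') dt' on the grid (trapezoidal rule).
    F = np.concatenate(([0.0], np.cumsum(0.5 * (tf[1:] + tf[:-1]) * np.diff(grid))))
    out = np.empty(len(r_eval))
    for k, r in enumerate(r_eval):
        upper = np.interp(grid + r, grid, F)          # np.interp holds F[-1] beyond the grid
        lower = np.interp(np.abs(grid - r), grid, F)
        out[k] = 2.0 * np.pi / r * trapezoid(grid * f_a * (upper - lower), grid)
    return out


def first_order_rdf(grid, beta_u_grid, densities, i, j):
    """Eq. (16) on grid[1:], with omega_ij^(1)(r) n = sum_k n_k C[f_ik, f_kj](r).

    beta_u_grid[a][b] : beta * u_ab tabulated on `grid` (np.inf inside hard cores).
    """
    f = [[np.expm1(-bu) for bu in row] for row in beta_u_grid]    # Mayer functions
    r = grid[1:]                                                  # skip r = 0
    omega_n = sum(n_k * mayer_convolution(r, grid, f[i][k], f[k][j])
                  for k, n_k in enumerate(densities))
    return r, np.exp(-beta_u_grid[i][j][1:] + omega_n)
```

Setting all densities to zero recovers the Boltzmann factor of Eq. (15). For purely repulsive potentials every Mayer function is non-positive, so the first-order term is non-negative and can push $g_{ij}(r)$ above 1 near contact, something a Boltzmann factor of a repulsive potential can never do.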



III. MATERIALS AND METHODS 



A. Samples 

A bovine milk βLG B stock solution (concentration 40 g L$^{-1}$) was obtained by ion exchange of protein samples against a 12 mM phosphate buffer (ionic strength $I_S$ = 7 mM and pH = 2.3) (Baldini et al., 1999). Nine samples at ionic strengths of 7, 17, 27, 47, 67, 87, 107, 207 and 507 mM were then prepared by adding appropriate amounts of NaCl. The final protein concentrations were about 10 g L$^{-1}$.

The monomeric βLG unit is composed of 162 amino acid residues and has a molecular weight of 18400 Da. The excluded protein volume has been calculated from the amino acid volumes, as reported by Jacrot and Zaccai (Jacrot, 1976; Jacrot and Zaccai, 1981). The resulting monomer volume is $V_1$ = 23400 Å$^3$; hence, the βLG electron density is $\rho_p$ = 0.418 e Å$^{-3}$. Considering the basicity of the amino acids, at pH = 2.3 the monomer charge should be close to 20e. This result is confirmed by the Gasteiger-Marsili method (Gasteiger and Marsili, 1980), assuming that all amino groups NH$_2$ are protonated at pH = 2.3. The crystallographic structure of βLG, in both monomer and dimer forms, can be found in the Protein Data Bank, entry 1QG5 (Oliveira et al., 2001). A sketch of the βLG dimer structure can be found in Fig. 1 of Ref. (Baldini et al., 1999). It can be observed that all 20 basic amino acids are on the protein surface, but two of them are at the monomer-monomer interface; therefore, at pH = 2.3 the ratio $Z_2/Z_1$ between dimer and monomer charges could be about 1.8, i.e. $(2 \times 20 - 2 \times 2)/20$ if the interfacial residues of both monomers do not contribute to the dimer charge.



B. SAXS experiments 

SAXS measurements were collected at the Physik Department of the Technische Universität München (Germany) using a rotating-anode generator. The radiation wavelength was λ = 0.71 Å and the temperature 20 °C. The $Q$ range was 0.035-0.1 Å$^{-1}$. βLG samples were measured in quartz capillaries with a diameter of 2 mm and a wall thickness of 10 μm (Hilgenberg, Malsfeld, D). X-ray patterns were collected by a two-dimensional detector and radially averaged. The scattering from a capillary containing only solvent was subtracted from the data after correction for transmission, capillary thickness and detector efficiency.



C. Best-Fit analysis 

A previous analysis of SAXS data for similar samples in the range $Q$ = 0.07-0.3 Å$^{-1}$ has recently been reported by some of us (Baldini et al., 1999). In the present work we have extended these experiments to the range $Q$ = 0.035-0.1 Å$^{-1}$, where protein-protein interactions are expected to play a major role. The two sets have then been combined into a single set of measurements with $Q$ ranging from 0.035 to 0.3 Å$^{-1}$.

As regards the calculation of the monomer and dimer 
form factors, it is well known that the scattering form 
factor of a biomolecule in solution depends on the crys- 
tallographic coordinates and the form factors of all con- 
stituent atoms, as well as on the hydration shell of the 
resulting macroparticle. Computer programs such as CRYSOL (Svergun et al., 1995) are able to calculate such a form factor, taking all the above-mentioned variables into account. It is also widely accepted that SAS is a low-resolution technique, and approximating the βLG protein by a homogeneous scattering particle yields comparable results up to $Q$ = 0.4 Å$^{-1}$, as we have verified by checking our method against the results of the CRYSOL software. The equivalent homogeneous scattering particle has a shape defined by the envelope of the van der Waals spheres centered on each atom. The SAS community often exploits the Monte Carlo method to calculate the form factor of a given shape (Henderson, 1996). We have modeled the hydration shell with a semi-Gaussian function, instead of the linear one proposed by Svergun (Svergun et al., 1997). Our simple and efficient method has already been applied successfully in previous works (Baldini et al., 1999; Mariani et al., 2000).

The Monte Carlo method used to calculate the distribution functions $p_i^{(1)}(r)$ and $p_i^{(2)}(r)$ of both monomers ($i = 1$) and dimers ($i = 2$) from their crystallographic structures is outlined in Appendix A. The form factors $\langle F_i(Q)\rangle_{\omega_Q}$ and $\langle F_i^2(Q)\rangle_{\omega_Q}$ have then been obtained through Eqs. (6) and (7), by calculating the radial integrals with a grid size of 1 Å up to a maximum $r$ corresponding to $p_i^{(2)}(r) = 0$ ($i = 1, 2$).
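Appendix A is not reproduced here, but one simple way to obtain such distributions by Monte Carlo, shown only as a hedged sketch, is to sample points uniformly inside the union of atomic van der Waals spheres and histogram (i) their distances from the centre of mass, giving $p_i^{(1)}(r)$, and (ii) the distances between random pairs of points, giving $p_i^{(2)}(r)$. The hydration shell is ignored, and the function names and batching are our own choices. For a single sphere of radius $R$, the mean of $p^{(1)}$ is $3R/4$ and the mean distance between two random interior points is $36R/35$, which gives a quick correctness check.

```python
import numpy as np


def sample_union_of_spheres(centers, radii, n_points, rng, batch=1024):
    """Uniform points inside a union of spheres, by rejection sampling from the bounding box.
    centers: (n_atoms, 3); radii: (n_atoms,)."""
    lo = (centers - radii[:, None]).min(axis=0)
    hi = (centers + radii[:, None]).max(axis=0)
    kept, total = [], 0
    while total < n_points:
        cand = rng.uniform(lo, hi, size=(batch, 3))
        d2 = ((cand[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)   # (batch, n_atoms)
        inside = (d2 <= radii[None, :] ** 2).any(axis=1)
        kept.append(cand[inside])
        total += int(inside.sum())
    return np.concatenate(kept)[:n_points]


def distance_distributions(points, dr, rng):
    """Histogram estimates of p^(1)(r) and p^(2)(r), each normalised to unit integral."""
    com = points.mean(axis=0)                               # centre of mass (homogeneous body)
    d1 = np.linalg.norm(points - com, axis=1)
    perm = rng.permutation(len(points))
    half = len(points) // 2
    d2 = np.linalg.norm(points[perm[:half]] - points[perm[half:2 * half]], axis=1)
    edges = np.arange(0.0, max(d1.max(), d2.max()) + 2.0 * dr, dr)
    p1, _ = np.histogram(d1, bins=edges, density=True)
    p2, _ = np.histogram(d2, bins=edges, density=True)
    return 0.5 * (edges[1:] + edges[:-1]), p1, p2
```

For a real protein, `centers` would hold atomic coordinates from the PDB entry and `radii` the corresponding van der Waals radii; the resulting histograms can then be integrated with the sine kernel of Eqs. (6) and (7).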

According to the dissociation free energy model described in Ref. (Baldini et al., 1999), the monomer molar fraction $x_1$ is a function of the ionic strength $I_S$. This suggests the possibility of a simultaneous fit of all SAXS intensity curves, using just a few parameters, all independent of $I_S$. In particular, as in Baldini et al. (1999), the following parameters have been fixed: the dielectric constant of the solvent, ε = 78.5; the experimental temperature, $T$ = 293 K; the ratio between the effective charges of dimer and monomer, $Z_2/Z_1$ = 1.8; and the monomer and dimer "bare" radii, $R_1$ = 19.15 Å and $R_2 = 2^{1/3} R_1$. The choice for $R_2$ is easily understood if we recall that our model of long-range interactions involves the approximation of treating a dimer as a sphere with twice the volume of the monomer. Introducing such an equivalent sphere is a simplifying approximation often used by the SAS community. On the other hand, we have calculated the form factor of the dimer from its exact, rather elongated shape.

In the global fit the only free parameters are therefore $Z_1$ and $\Delta G_{nel}$, the non-electrostatic free energy. The merit functional to be minimized was defined as

$$\chi^2 = \sum_{m=1}^{N_S} \sum_{i=1}^{N_{Q,m}} \left\{ \frac{[d\Sigma/d\Omega]_m^{exp}(Q_i) - k_m [d\Sigma/d\Omega]_m^{th}(Q_i) - B_m}{\sigma_m(Q_i)} \right\}^2$$

where $N_S$ is the number of scattering curves under analysis, $N_{Q,m}$ is the number of experimental points in the $m$-th curve, and $\sigma_m(Q_i)$ is the experimental uncertainty on the intensity value at $Q_i$. $[d\Sigma/d\Omega]_m^{th}(Q_i)$ is the corresponding cross section predicted by the model using Eq. (2). For each experiment, the calibration factor $k_m$ and the flat background $B_m$ have been adjusted by a linear least-squares fit to $[d\Sigma/d\Omega]_m^{exp}(Q)$. The partial structure factors, Eq. (3), have been calculated with an upper integration limit of $r$ = 500 Å and a grid size of 1 Å.
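Because $k_m$ and $B_m$ enter the merit functional linearly, they can be eliminated curve by curve with a weighted linear least-squares solve, leaving only $Z_1$ and $\Delta G_{nel}$ to the outer non-linear minimisation (not shown). The sketch below illustrates this inner step with NumPy; the function names and the tuple-based interface are hypothetical.

```python
import numpy as np


def scale_and_background(y_exp, y_model, sigma):
    """Weighted linear least squares for y_exp ~ k * y_model + B.

    Returns k (calibration factor), B (flat background) and this curve's
    contribution to the merit functional.
    """
    w = 1.0 / sigma
    A = np.column_stack((y_model * w, w))          # rows of the design matrix scaled by 1/sigma
    (k, B), *_ = np.linalg.lstsq(A, y_exp * w, rcond=None)
    resid = (y_exp - k * y_model - B) / sigma
    return k, B, float(np.sum(resid**2))


def global_merit(curves):
    """Merit functional summed over curves, with k_m and B_m profiled out.

    `curves` is a list of (y_exp, y_model, sigma) tuples, one per ionic strength
    (hypothetical container; y_model would come from Eq. (2) for given Z_1 and dG_nel).
    """
    return sum(scale_and_background(*curve)[2] for curve in curves)
```

Profiling out the linear parameters this way keeps the non-linear search two-dimensional, which matters when each model evaluation requires recomputing all $S_{ij}(Q)$.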

The physical meaning of the "flat background" requires a comment, since subtraction of a constant is usually accepted for neutron scattering but not for X-ray scattering. These backgrounds are introduced because one of the major experimental problems with X-rays is the exact determination of the transmission factor. An inexact value would result in an imperfect subtraction of the background due to electronic noise. However, as shown later in Table II, the low values obtained for $B_m$, compared with the values of the scaling factors, indicate that these parameters play a minor role in the data analysis.

Typical calculation times for the best fit on a Digital Alpha 433 are a few minutes for the zeroth-order approximation and about 20 hours for the first-order one. The effect of experimental errors on the fitting parameters has been determined using a sampling method. For each scattering curve, we start from the $N_{Q,m}$ intensities $[d\Sigma/d\Omega]_m^{exp}(Q_i)$ with their experimental standard deviations, and we generate new data sets (15 for βLG) by sampling from $N_{Q,m}$ Gaussians of width $\sigma_m(Q_i)$ centred at the observed values. Each data set generated for all curves is then analysed with the global fit algorithm described earlier. The errors on the fitting parameters, $Z_1$ and $\Delta G_{nel}$, and on the scaling parameters, $k_m$ and $B_m$, are obtained by calculating their values from each data set and, finally, their standard deviation with respect to the first value, obtained from the original data.
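The resampling procedure can be sketched generically as below: every refit uses a copy of the data perturbed by Gaussian noise of width $\sigma_m(Q_i)$, and the quoted error is the root-mean-square deviation of the refitted parameters from the fit to the measured data. The `fit` callable and its signature are hypothetical stand-ins for the global fit; the default of 15 data sets mirrors the number used for βLG.

```python
import numpy as np


def resampled_errors(fit, x, y, sigma, rng, n_sets=15):
    """Parameter errors from refits of Gaussian-perturbed data.

    fit(x, y, sigma) -> 1-D array of parameters (hypothetical interface).
    Returns (p_ref, err): the fit to the measured data and the RMS deviation
    of the n_sets refits from it.
    """
    p_ref = np.asarray(fit(x, y, sigma), dtype=float)
    samples = np.array([fit(x, rng.normal(y, sigma), sigma) for _ in range(n_sets)])
    return p_ref, np.sqrt(np.mean((samples - p_ref) ** 2, axis=0))


if __name__ == "__main__":
    # Toy example: a weighted straight-line fit with a constant uncertainty.
    def line_fit(x, y, s):
        return np.polyfit(x, y, 1, w=1.0 / s)

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 40)
    sigma = np.full_like(x, 0.05)
    y = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, x.size)
    print(resampled_errors(line_fit, x, y, sigma, rng))
```

With only 15 refits the error estimates are themselves uncertain by roughly 20%, which is usually acceptable for quoting parameter uncertainties.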



IV. RESULTS AND DISCUSSION 

Fig. 1 depicts the experimental X-ray intensity $[d\Sigma/d\Omega](Q)$ as a function of the momentum transfer $Q$ at several values of ionic strength. Here, instead of the usual logarithmic scale, we have preferred a linear scale, so that the reader can more easily appreciate the small differences between experimental data and theoretical curves, which would be hardly visible on a logarithmic scale.
Our measurements clearly show the formation and evolution of an interference peak at small angles as the ionic strength decreases. The appearance of such a peak is evidently due to increasing protein-protein interactions. In the same figure, the performance of our first-order approximation is compared with that of the commonly used zeroth-order one. The first-order approximation yields a fit of rather good quality over the whole measured $Q$ range. The development of the interference peak, underestimated by the zeroth-order approximation, is now well reproduced, indicating that the main physical features of the βLG solution are indeed taken into account by our simple interaction model.

Fig. 2 shows the theoretical average structure factor $S_m(Q)$ along with the experimental data. While at high $I_s$ (i.e. for weak effective interactions) the two approximations are practically indistinguishable, for $I_s < 27$ mM the 1st-order results clearly outperform the 0th-order ones, mainly in the low-$Q$ region.

A more transparent comparison between the two approximations is carried out in Fig. 3 at the level of the RDFs. As $I_s$ decreases, the 1st-order $g_{ij}(r)$ ($i, j = 1, 2$) become strongly different from the 0th-order ones, exhibiting a peak of increasing height. In terms of potentials of mean force, since $g_{ij}(r) = \exp[-\beta W_{ij}(r)]$, the occurrence of $g_{ij}(r) > 1$ in some regions (mainly for $I_s < 27$ mM) implies that $W_{ij}(r) < 0$ there, although the direct potential $u_{ij}(r)$ always remains positive. The first-order correction $\omega_{ij}^{(1)}(r)\,n$ therefore corresponds to an attractive contribution, due to an "osmotic depletion" effect (Asakura and Oosawa, 1954) exerted on two given macroparticles by the remaining ones. This many-body effect is clearly lacking in the 0th-order approximation, as depicted in Fig. 3. Depletion forces arise when two protein molecules are close together: the pressure exerted on these molecules by all other macroparticles then becomes anisotropic, leading to a strong indirect protein-protein attraction, even though all direct interactions are repulsive.
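To illustrate the depletion argument in the simplest possible case, the sketch below assumes a one-component hard-sphere fluid of diameter $d$ (not the DLVO model of this paper). Its Mayer function is $-1$ inside the core and $0$ outside, so the first-order coefficient $\omega^{(1)}(r)$ is the analytic volume of overlap of two spheres of radius $d$. Although the direct potential is purely repulsive, $g(r) = \exp[-\beta u(r)]\exp[n\,\omega^{(1)}(r)]$ exceeds 1 for $d < r < 2d$. The function names and the packing fraction are ours.

```python
import numpy as np


def overlap_volume(r, d):
    """Volume common to two spheres of radius d whose centres are a distance r apart.

    For hard spheres of diameter d the Mayer function is -1 for r < d and 0
    otherwise, so the convolution of two Mayer functions is this lens volume."""
    r = np.asarray(r, dtype=float)
    lens = np.pi / 12.0 * (4.0 * d + r) * (2.0 * d - r) ** 2
    return np.where(r < 2.0 * d, lens, 0.0)


def g_zeroth_order(r, d):
    """Boltzmann factor exp(-beta u): 0 inside the hard core, 1 outside."""
    return np.where(np.asarray(r, dtype=float) >= d, 1.0, 0.0)


def g_first_order(r, d, n):
    """exp(-beta u) * exp(n * omega1), keeping only the first density correction."""
    return g_zeroth_order(r, d) * np.exp(n * overlap_volume(r, d))


d = 1.0
eta = 0.1                         # hypothetical packing fraction n*pi*d^3/6
n = 6.0 * eta / (np.pi * d**3)    # corresponding number density
r = np.array([0.5, 1.0, 1.5, 2.5])
print(g_zeroth_order(r, d))       # step function: no structure beyond contact
print(g_first_order(r, d, n))     # values above 1 between d and 2d
```

At contact this gives $g(d^+) = \exp(5\eta/2)$, whose expansion reproduces the exact first-order hard-sphere contact value $1 + 5\eta/2$.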

It is worth stressing that the behavior of the 1st-order $g_{ij}(r)$ at low ionic strength could also be reproduced by the 0th-order approximation, but only at the cost of adding some unnecessary, and somewhat misleading, density-dependent attractive term to the direct pair potentials. Our model, based only on the physically sound repulsive part of the DLVO potential, turns out to be rather accurate for the purposes of the present paper. We have also performed some calculations including a Hamaker term in our perturbative scheme, without finding any significant change in the 1st-order results with respect to the previous ones.

The 1st-order RDFs shown in Fig. 3 are undoubtedly correctly shaped, although the peak heights might be modified by the neglected second- and higher-order corrections to the potentials of mean force. Unfortunately, estimating the magnitude of the successive perturbative terms (which depend on both the concentration and the charge of the protein molecules) is a far more complicated task and goes beyond the scope of the present paper. Since the resulting protein charges (see Table I) are relatively large, it is reasonable to expect that the contribution of the higher-order terms might be appreciable. As the protein concentration increases, this correction becomes more and more significant, and eventually the rather good performance of our 1st-order approximation must break down.

Since a direct computation of even the second-order corrections demands a high computational effort, the accuracy of the 1st-order approximation may alternatively be investigated by checking our RDF results against exact Monte Carlo or molecular dynamics simulation data for the same model. A simpler indication of the limits of validity of our scheme may come from a systematic comparison with integral-equation predictions based upon more accurate closures. One could use, for instance, the multicomponent version of the "rescaled MSA" approach (Ruiz-Estrada et al., 1990), which has the advantage of being nearly fully analytical. On the other hand, if more accurate results are required, the Rogers-Young closure (Rogers and Young, 1984) is preferable for our potential, but in this case the corresponding integral equations must be solved numerically. We have planned some investigations along these lines, and their results will be reported elsewhere. However, we believe that, at the protein concentration considered here, the 1st-order approximation does yield the correct trend of the RDFs. In our opinion, the inclusion of the neglected terms cannot alter the qualitative (or semiquantitative) picture of βLG interactions supported by our model, even if slightly different values of the best-fit parameters should be expected.

The parameter values resulting from the global best-fit procedure, using the 0th-order and 1st-order approximations, are reported in Tables I and II.

The improved quality of the fit obtained with the first-order approximation can clearly be appreciated by comparing not only the global value of the merit functional (Table I), but above all the partial $\chi_m$ values (Table II), in particular for $I_s < 27$ mM. Although the change in the global value is not so large, the relative variation of the $\chi_m$'s (last column of Table II) shows that the improvement is rather evident for the low-ionic-strength samples, while it becomes less and less important with increasing ionic strength. The proposed method improves the goodness of the fit by about 43% for the first sample, i.e. at the lowest ionic strength, where the interference peak is most pronounced. The decrease of the relative variation as the ionic strength increases is in agreement with the expected progressive weakening of the protein-protein repulsions.

Note that the values of both fitting parameters, i.e. $Z_1$ and $\Delta G_{\rm nel}$, turn out to be very similar for both approximations. The scaling factors, $k_m$, and the flat backgrounds, $B_m$, are also similar for all samples and for both approximations, confirming that no other effects, such as denaturation or larger aggregates, are actually present.
V. CONCLUSIONS

In this paper we have presented a novel methodological approach to the study of protein-protein interactions using SAXS techniques. Our work builds upon a previous investigation by some of us (Baldini et al., 1999).

As discussed in detail by Baldini et al. (1999), the structural properties of βLG in acidic solution, studied by light and X-ray scattering over a wide range of ionic strength and concentration, are consistent with the existence of monomers and dimers, and cannot be ascribed to a denaturation process.

Since the form factors of both species can be readily computed, the so-called "measured" or average structure factor $S_m(Q)$ can be obtained from the ratio between the experimental intensity and the average form factor $P(Q)$ at a given monomer fraction $x_1$. $S_m(Q)$ reflects the effective protein-protein interactions. Short-range attractive interactions such as hydrogen bonds, which are responsible for dimer formation and depend strongly on the monomer-monomer orientation, are taken into account through a quasi-chemical description of the thermodynamic equilibrium between the monomer and dimer forms of βLG. Thus, in addition to the hard-core repulsions, the effective potentials of mean force only describe long-range monomer-monomer, monomer-dimer and dimer-dimer electrostatic repulsions, which can be reduced to their orientational averages, depending only on the intermolecular distance $r$.
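The chain of steps summarized above (monomer fraction from the dissociation equilibrium, average form factor, ratio to the intensity) can be sketched as follows. The assumptions are ours, not the paper's: the equilibrium D ⇌ 2M is described by a constant $K_d = [M]^2/[D]$ (in the paper it follows from the fitted free energy), monomers and dimers are replaced by homogeneous spheres of hypothetical radii instead of the atomic-based form factors of Appendix A, the average form factor is the number-weighted $x_1 P_1(Q) + (1 - x_1) P_2(Q)$, and contrast and instrumental scale factors are omitted.

```python
import numpy as np


def monomer_fraction(c_total, k_d):
    """Number fraction x1 of monomers for the equilibrium D <-> 2M.

    Assumes K_d = [M]^2 / [D] and mass balance c_total = [M] + 2[D],
    all concentrations in the same molar units."""
    monomer = (-k_d + np.sqrt(k_d**2 + 8.0 * k_d * c_total)) / 4.0
    dimer = (c_total - monomer) / 2.0
    return monomer / (monomer + dimer)


def sphere_form_factor(q, radius):
    """Unnormalised form factor V^2 * Phi(qR)^2 of a homogeneous sphere (contrast omitted)."""
    qr = q * radius
    amplitude = 3.0 * (np.sin(qr) - qr * np.cos(qr)) / qr**3
    volume = 4.0 / 3.0 * np.pi * radius**3
    return (volume * amplitude) ** 2


def measured_structure_factor(intensity, q, x1, n_total, r_monomer, r_dimer):
    """S_m(Q) = I(Q) / (n * P_avg(Q)) with a number-weighted average form factor."""
    p_avg = (x1 * sphere_form_factor(q, r_monomer)
             + (1.0 - x1) * sphere_form_factor(q, r_dimer))
    return intensity / (n_total * p_avg)


# Hypothetical numbers for illustration only (q in 1/Angstrom, radii in Angstrom).
q = np.linspace(0.01, 0.3, 30)
x1 = monomer_fraction(c_total=0.5, k_d=0.1)
n_total = 1.0
ideal = n_total * (x1 * sphere_form_factor(q, 17.0)
                   + (1.0 - x1) * sphere_form_factor(q, 22.0))
s_m = measured_structure_factor(ideal, q, x1, n_total, 17.0, 22.0)  # no interactions
```

For an intensity built without interactions, as in the usage lines, the ratio returns $S_m(Q) = 1$; deviations from 1 in real data carry the effective interactions.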

In the work by Baldini et al. (1999), all long-range protein-protein forces were neglected, because the measured SAXS intensity spanned a $Q$ range where such interactions are essentially negligible. In contrast, we have explicitly addressed this issue in the present work. To this aim, (i) we have extended the range of measured intensities to lower $Q$ values in order to probe these long-range interactions experimentally, and (ii) we have proposed a simple but efficient perturbative scheme, whose first terms are able to yield reasonably accurate RDFs for dilute or moderately concentrated solutions of globular proteins with rather little computational effort. In particular, we have explicitly computed the 0th- and 1st-order approximations and compared their results.

The improvement in the quality of the fit for $S_m(Q)$ obtained with the first-order correction to the potentials of mean force corresponding to the RDFs, with respect to the standard zero-density approximation, is particularly visible at low ionic strength, where Coulomb repulsions are poorly screened. In this regime, the new representation of the RDFs is able to reproduce the interference peak present in the experimental $S_m(Q)$, whereas the commonly used zero-density approximation turns out to be quite inadequate.
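To put a number on "poorly screened", the Debye screening length $1/\kappa_D$ can be evaluated from the textbook expression $\kappa_D^2 = 2 N_A e^2 I_s/(\varepsilon_0 \varepsilon_r k_B T)$, with $I_s$ expressed in mol/m³. The sketch assumes water at 25 °C with $\varepsilon_r = 78.5$; these values are our assumption, not parameters taken from the paper.

```python
import math


def debye_length_nm(ionic_strength_molar, eps_r=78.5, temperature=298.15):
    """Debye screening length 1/kappa_D in nm.

    kappa_D^2 = 2 N_A e^2 I / (eps0 eps_r k_B T), with the ionic strength I
    converted from mol/L to mol/m^3. eps_r = 78.5 and T = 298.15 K are
    assumed values for water at 25 degrees C."""
    eps0 = 8.8541878128e-12    # vacuum permittivity, F/m
    k_b = 1.380649e-23         # Boltzmann constant, J/K
    n_a = 6.02214076e23        # Avogadro constant, 1/mol
    e = 1.602176634e-19        # elementary charge, C
    i_si = ionic_strength_molar * 1000.0
    kappa_sq = 2.0 * n_a * e**2 * i_si / (eps0 * eps_r * k_b * temperature)
    return 1e9 / math.sqrt(kappa_sq)


for i_mm in (7, 27, 507):   # ionic strengths mentioned in the text, in mM
    print(f"I_s = {i_mm:3d} mM: 1/kappa_D = {debye_length_nm(i_mm / 1000):.2f} nm")
```

Since $1/\kappa_D \propto I_s^{-1/2}$, over the range studied here (7 to 507 mM) the screening length shrinks by a factor $\sqrt{507/7} \approx 8.5$, from roughly 3.6 nm to 0.43 nm under these assumptions.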

Finally, two points are particularly noteworthy. 

First, the adopted model allows a simultaneous fit of nine SAS curves with only two free parameters, independent of the ionic strength, i.e., the non-electrostatic dissociation free energy and the monomer charge. This finding indicates that our simple interaction model is already able to describe the main structural features of the examined βLG solutions. Satisfactory results obtained by many other structural studies on colloidal or protein solutions, based upon similar, very simplified models (Wagner et al., 1991; Krause et al., 1991; D'Aguanno and Klein, 1992; D'Aguanno et al., 1992; Nagele et al., 1993; Wanderlingh et al., 1994), suggest that the use of very refined potentials, containing a large number of different contributions, is often unnecessary, at least in the first stages of an investigation. Using sophisticated interaction models may even be meaningless when coupled with a very rough treatment of the correlation functions, as is often the case with the widely employed 0th-order approximation, even though the introduction of a larger number of parameters can clearly improve the actual fit of the data. Moreover, we have pointed out that, even in models with purely repulsive interactions, attractive effects (due to "osmotic depletion") are predicted by every sufficiently accurate theory. By contrast, within the zero-density approximation for the RDFs, the same attractive effects may be reproduced only at the cost of adding artificial contributions to the potentials.

Second, the proposed 1st-order approximation to the RDFs does yield accurate predictions for the average structure factor of dilute protein solutions, in a rather simple but physically sound way. It is worth stressing that the underlying calculation scheme is not restricted to the particular model considered in this paper, but may easily be applied to other spherically symmetric potentials. Although the limit of validity of the 1st-order approximation is still an open question, which we plan to investigate in future work, we think that it may represent a useful new tool for the analysis of experimental SAS data of globular protein solutions, when their concentration is not too high and their interaction forces are not too strong. When these two conditions fail, it becomes unavoidable to compute the correlation functions by exploiting a more powerful method from the statistical mechanical theory of liquids (Hansen and McDonald, 1986). We hope, however, that this paper will stimulate the application of the proposed 1st-order approximation to other sets of experimental data on proteins, as well as new theoretical work on the quality and limits of this calculation scheme.
APPENDIX A: CALCULATION OF PROTEIN FORM FACTORS

In detail, the scattering particle is assumed to be homogeneous and its size and shape are described by the function $s(\mathbf{r})$, which gives the probability that the point $\mathbf{r} = (r, \omega_r)$ (where $\omega_r$ denotes the polar angles) lies within the particle. For compact particles, like globular proteins, this function can be written in terms of a unique two-dimensional angular shape function $F(\omega_r)$, as

$$
s(\mathbf{r}) =
\begin{cases}
1, & r \le F(\omega_r),\\
\exp\{-[r - F(\omega_r)]^2/2\sigma^2\}, & r > F(\omega_r),
\end{cases}
\qquad \text{(A1)}
$$

where $\sigma$ is the width of the Gaussian that accounts for the particle surface mobility (Svergun et al., 1998). The shape function $F(\omega_r)$ is evaluated by fixing the axis origin on the mean value of the atomic coordinates, running over each atom $m$, and taking the maximum distance $r$ between the origin and the intersection, if any, of the van der Waals sphere centred on $m$ with the direction $\omega_r$. Assuming homogeneous particles belonging to species $i$, $M_i$ random points are generated from polar coordinates. The sampling is made for the azimuthal angle, $\cos\beta_r$ and $r$ in the ranges $[0, 2\pi]$, $[-1, 1]$ and $[0, r_{\max}]$, respectively.
Following Eq. (A1), if $r \le F(\omega_r)$ the point is accepted; otherwise the probability $\mathcal{P} = \exp\{-[r - F(\omega_r)]^2/2\sigma^2\}$ is calculated. A random number $y$ between 0 and 1 is extracted, and if $y < \mathcal{P}$ the point is accepted, otherwise it is rejected. The $p_i^{(1)}(r)$ histogram is then determined by taking into account the distances between the $M_i$ points and the centre, while the $p_i^{(2)}(r)$ histogram depends on the distances between all possible pairs of the $M_i$ points:

$$
p_i^{(2)}(r) = \frac{2}{\Delta r\, M_i (M_i - 1)} \sum_{n=1}^{M_i - 1} \sum_{m=n+1}^{M_i} H\!\left(\frac{\Delta r}{2} - |r - r_{nm}|\right), \qquad \text{(A3)}
$$

where $\Delta r$ is the grid amplitude in the space of radial distance and $r_n$ the distance between the centre and the $n$-th point. Here $r_{nm}$ is the distance between the points $n$ and $m$, and $H(x)$ is the Heaviside step function ($H(x) = 0$ if $x < 0$ and $H(x) = 1$ if $x > 0$). The number of random scattering centres was $M_i = 2000$, the grid size was $\Delta r = 1$ Å, and the width of the surface mobility was fixed to $\sigma = 2$ Å.

APPENDIX B: FIRST-ORDER PERTURBATIVE CORRECTIONS

In the density expansion of the potentials of mean force $W_{ij}(r)$,

$$
-\beta W_{ij}(r) = -\beta u_{ij}(r) + \omega_{ij}^{(1)}(r)\, n + \omega_{ij}^{(2)}(r)\, n^2 + \cdots, \qquad \text{(B1)}
$$

the exact power coefficients $\omega_{ij}^{(k)}(r)$ ($k = 1, 2, \ldots$) can be computed by using standard diagrammatic techniques (Meeron, 1958), which yield the results in terms of appropriate multidimensional integrals of products of Mayer functions

$$
f_{ij}(r) = \exp[-\beta u_{ij}(r)] - 1. \qquad \text{(B2)}
$$

Within our approximation, we are only required to compute the first term, which involves a convolution and turns out to be

$$
\omega_{ij}^{(1)}(r) = \sum_k x_k\, I_{ij}^{(k)}(r), \qquad I_{ij}^{(k)}(r) = \int d\mathbf{r}'\, f_{ik}(r')\, f_{kj}(|\mathbf{r} - \mathbf{r}'|), \qquad \text{(B3)}
$$

where $x_k = n_k/n$ is the molar fraction of species $k$. The evaluation of the convolution integral $I_{ij}^{(k)}(r)$ is not a difficult task in bipolar coordinates: the integration over angles is easily performed, and $I_{ij}^{(k)}(r)$ reduces to a double integral, which can be written as

$$
I_{ij}^{(k)}(r) = \frac{2\pi}{r} \int_0^{\infty} dx\, [x f_{ik}(x)] \int_{|x - r|}^{x + r} dy\, [y f_{kj}(y)]. \qquad \text{(B4)}
$$

We have evaluated all these $I_{ij}^{(k)}(r)$ terms at the points $r_l = l\,\Delta r$ ($l = 1, \ldots, 500$), with $\Delta r = 1$ Å. At each $r_l$ value, the double integral has been carried out numerically, simply by using the trapezoidal rule for both the $x$- and $y$-integrations. For the $x$-integration, we have chosen as upper limit the value $x_{\max} = \max(x_{\rm cut}, R_2 + r)$, with $x_{\rm cut} = R_2 + 12/\kappa_D$ (depending on the ionic strength), and as grid size $\Delta x = x_{\rm cut}/200$. For the $y$-integration, $\Delta y = \Delta x$.
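Eq. (B4) lends itself to a direct numerical implementation. The sketch below evaluates $I_{ij}^{(k)}(r)$ with the trapezoidal rule in both $x$ and $y$, as described above, and combines the terms as in Eq. (B3); it accepts any vectorized Mayer function, built here from a reduced potential $\beta u(r)$ via Eq. (B2). The function names, grid parameters and the hard-sphere check against the analytic lens volume are our own choices, not those used for βLG.

```python
import numpy as np


def trapezoid(y, x):
    """Plain trapezoidal rule (written out to avoid NumPy's trapz/trapezoid renaming)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def mayer(beta_u):
    """Eq. (B2): f(r) = exp[-beta u(r)] - 1 for a vectorised reduced potential beta_u."""
    return lambda r: np.expm1(-beta_u(r))


def bipolar_convolution(f_ik, f_kj, r, x_max, dx):
    """Eq. (B4): (2 pi / r) * int_0^x_max dx x f_ik(x) * int_|x-r|^(x+r) dy y f_kj(y)."""
    xs = np.arange(0.0, x_max + 0.5 * dx, dx)
    inner = np.empty_like(xs)
    for idx, x in enumerate(xs):
        lo, hi = abs(x - r), x + r
        n_pts = max(int(round((hi - lo) / dx)), 1) + 1   # y spacing close to dx
        ys = np.linspace(lo, hi, n_pts)
        inner[idx] = trapezoid(ys * f_kj(ys), ys)
    return 2.0 * np.pi / r * trapezoid(xs * f_ik(xs) * inner, xs)


def omega1(i, j, r, mole_fractions, mayer_fns, x_max, dx):
    """Eq. (B3): omega_ij^(1)(r) = sum_k x_k I_ij^(k)(r)."""
    return sum(
        x_k * bipolar_convolution(mayer_fns[i][k], mayer_fns[k][j], r, x_max, dx)
        for k, x_k in enumerate(mole_fractions)
    )


# Check on one-component hard spheres of diameter d, where I(r) is the lens volume.
d = 1.0
f_hs = mayer(lambda r: np.where(r < d, np.inf, 0.0))
numeric = omega1(0, 0, 1.5, [1.0], [[f_hs]], x_max=2.0, dx=0.0025)
exact = np.pi / 12.0 * (4.0 * d + 1.5) * (2.0 * d - 1.5) ** 2
print(numeric, exact)
```

For the repulsive DLVO potentials of the paper, the lambda passed to `mayer` would be replaced by the corresponding reduced pair potentials, `x_max` chosen as in the text, and the result multiplied by the total density $n$ before entering Eq. (B1). The step discontinuity of the hard-sphere Mayer function makes the trapezoidal rule only first-order accurate here; smooth potentials converge faster.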
FIG. 1. Linear-scale SAXS profiles of βLG at pH 2.3 and concentration 10 g/L under different ionic strength conditions (as indicated above each curve). Points are experimental results, whereas the dashed and solid lines represent the best fits obtained by applying the 0th-order and 1st-order approximations of the pair correlation functions, respectively. The curves are scaled for clarity by a factor 0.5.
FIG. 2. Comparison between the measured structure factors $S_m(Q)$ for βLG at pH 2.3 and concentration 10 g/L under different ionic strength conditions (as indicated above each curve). The best-fit lines resulting from the simultaneous analysis of the corresponding SAXS curves (Fig. 1) using the 0th-order (dashed) and 1st-order (solid) approximations of the pair correlation functions are reported. Data for $Q > 0.12$ Å⁻¹ are not shown for clarity.
FIG. 3. Partial correlation functions $g_{ij}(r)$ resulting from the simultaneous analysis of the nine SAXS curves of Fig. 1 (the ionic strength, $I_s$, is indicated near each set of curves) by applying the 0th-order (left column) and 1st-order (right column) approximations in the density expansion of the mean-force potential. Depicted are the monomer-monomer $g_{11}(r)$ (dotted lines), monomer-dimer $g_{12}(r)$ (dashed lines) and dimer-dimer $g_{22}(r)$ (solid lines) correlation functions.
19.6 ± 0.1
14.8 ± 0.1

20.0 ± 0.2
16.6 ± 0.1
TABLE I. Comparison of the fitting parameters (the monomer effective charge, $Z_1$, and the non-electrostatic free energy, $\Delta G_{\rm nel}$) and of the merit functional resulting from the simultaneous analysis of the nine SAXS curves of Fig. 1 by applying the 0th-order and 1st-order approximations of the pair correlation functions.